Measuring Structural Similarity Among Web
Authors
Abstract
When we describe a Web page informally, we often use phrases like "it looks like a newspaper site", "there are several unordered lists" or "it's just a collection of links". Unfortunately, no Web search or classification tools provide the capability to retrieve information using such informal descriptions that are based on the appearance, i.e., structure, of the Web page. In this paper, we take a look at the concept of structurally similar Web pages. We note that some structural properties can be identified with semantic properties of the data and provide measures for comparison between HTML documents.
Similar resources
A Novel Approach to Measuring Structural Similarity between XML Documents
Measuring structural similarity between XML documents has become a key component in various applications, including XML mining, schema matching, and web service discovery, among others. This paper presents a novel structural similarity measure incorporating kernel methods into XML documents. Results on preliminary simulations show that this approach outperforms conventional ones.
Measuring the Structural Similarity of Web-based Documents: A Novel Approach
Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measure the similarity in the framework of the well-known vector space model. In contrast to these we present a new approach to measuring the structural similarity of web-based documents represented by so c...
Measuring Structural Similarity Among Web Documents: Preliminary Results
When we describe a Web page informally, we often use phrases like "it looks like a newspaper site", "there are several unordered lists" or "it's just a collection of links". Unfortunately, no Web search or classification tools provide the capability to retrieve information using such informal descriptions that are based on the appearance, i.e., structure, of the Web page. In this paper, we take...
Microsoft Word - CONTENTS-AUGUST07
Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measure the similarity in the framework of the well-known vector space model. In contrast to these we present a new approach to measuring the structural similarity of web-based documents represented by so c...
A Statistical Model for Measuring Structural Similarity between Webpages
This paper presents a statistical model for measuring structural similarity between webpages from bilingual websites. Starting from basic assumptions we derive the model and propose an algorithm to estimate its parameters in an unsupervised manner. The statistical approach appears to benefit the structural similarity measure: in the task of distinguishing parallel webpages from bilingual websites our ...
Publication date: 1998